Abstract

Data distribution, data replication, and system reliability are key factors
in determining the availability measures for transactions in distributed
database systems. To simplify the evaluation of these measures, database
designers and researchers tend to make unrealistic assumptions about these
factors. In this paper, we investigate the effect of such assumptions on the
computational complexity and accuracy of such evaluations. We represent a
database system with five parameters related to the above factors.
Probabilistic analysis is employed to evaluate the availability of read-only
and read-write transactions. We consider both the read-one/write-all and the
majority-read/majority-write replication control policies. We conclude that
transaction availability is more sensitive to variations in the degree of
replication, less sensitive to data distribution, and insensitive to
reliability variations in a heterogeneous system. The computational
complexity of the evaluations is found to be determined mainly by the chosen
distributed database model, while the accuracy of the results depends much
less on the model.

Keywords and phrases: Availability, Database Models, Distributed Systems,
Distributed Database Systems, Performance Evaluation, Probabilistic
Analysis, Reliability, Transaction Availability
1 Introduction

A distributed database system is a collection of cooperating nodes, each
containing a set of data objects¹. A user transaction can enter such a
system at any of these nodes. The receiving node, sometimes referred to as
the coordinating or initiating node, undertakes the task of locating the
nodes that contain the data objects required by the transaction.

When we consider systems that require high guarantees for the successful
execution of transactions (especially read-only transactions), it is
important to consider transaction availability. Although a number of
availability (and reliability) metrics have been defined for computer
systems [2,9], in this paper we use two metrics: transaction start
availability (TSA) and transaction finish availability (TFA).

Transaction start availability (TSA) defines the probability with which a
transaction can successfully start its execution. By our definition, a
transaction is said to have a successful start when it can access all the
required² copies of the data objects that it needs for its execution. For
simplicity, we consider a data copy at a node to be available for access
when that node is up and it is accessible from the node that is currently
coordinating the execution of the transaction. A transaction can start its
execution as soon as all the required data object copies are available.

Transaction finish availability (TFA) defines the probability with which a
transaction can complete its execution, given that it has started its
execution successfully. If execution times for transactions are negligible
compared to the mean time to failure of the components, then TFA will be
close to 1. However, since transactions take a finite but significant amount
of time to execute, the nodes that are involved in the execution of a
transaction (and available at the start of execution) may fail during its
execution. In this case, the transaction is said to be aborted, and its
execution needs to be restarted.

1 In this paper, the basic unit of access in a database is referred to as a data object.

2 The number of copies of an object that are required to be accessed by a transaction
depends on the operation (read or write) and the replica copy control policy (e.g.,
read-one/write-all, majority) [3,18].

Formal definitions and evaluation of these two metrics (TSA and TFA) 
depend on several factors such as the fault model of the system (including the 
reliabilities of the system components), the transaction execution policy, the 
data distribution policy, the degree of data replication, the concurrency and 
commit protocols, and the characteristics of the given transaction [4,7,9]. In 
addition, TFA depends on the execution times of transactions. 

Even though it is theoretically possible to formulate equations expressing 
the two metrics in terms of the above mentioned factors, the evaluation of 
these equations is extremely cumbersome and requires unreasonably high 
computation times. The evaluation of the exact values for these measures 
generally involves both analysis and simulation. Evaluation tools with such 
large execution times are certainly not acceptable to a database designer 
who needs to evaluate a number of such possible database configurations 
before arriving at a final design. 

To overcome these problems, designers and researchers generally resort to
approximation techniques [7,8,16]. These techniques reduce the computation
time by making simplifying assumptions regarding data distribution, data
replication, and transaction execution. The time complexity of these
techniques primarily depends on the underlying model as well as the
evaluation technique.

The effect of data distribution and replication models on the evaluation of
transaction response time has been measured in earlier studies [13]. These
studies indicate that the computational complexity of a selected database
model does not necessarily reflect the accuracy of the resulting performance
evaluations. In fact, a model requiring $O(n^2)$ computation time³ has
yielded results very close to those from a complex model with $O(n^n)$
complexity.

In this paper, we study the effect of data distribution, data replication,
and fault models on the accuracy of transaction availability evaluations. We
employ probabilistic analysis to estimate the desired values for six typical
models.

The remainder of this paper is organized as follows. Section 2 formally
defines the problem under consideration. In Section 3, we describe a
classification scheme for data distribution and replication policies.
Section 4 illustrates the advantages of probabilistic analysis over
simulation, and employs this technique to evaluate the measures for two
different models. In Section 5, we compare the analysis methods for six
models based on computational complexity, space complexity, and the accuracy
of the measures. Finally, in Section 6, we summarize the results obtained and
suggest a general approach for the design and analysis of these systems.

3 Here, n denotes the number of nodes in a distributed system.


2 Problem Description 

In this paper, a read-only transaction is characterized by the average number 
of data objects that it reads (i.e., its read-set size). Similarly, a read-write 
transaction is characterized by the number of data objects that it reads 
(read-set size), and the number of data objects that it updates (write-set 
size). 

The problem of estimating the availability of a read-only transaction may be
formulated as follows.

Given the following parameters, estimate $TSA_s$ and $TFA_s$ for a read-only
transaction that requires $s$ data objects for read access:

• $n$, the number of nodes in the database⁴

• $N$, the index set for the nodes in the database; $N = \{1, 2, \ldots, n\}$

• $d$, the number of data objects in the database

• $D$, the index set for the data objects in the database; $D = \{1, 2, \ldots, d\}$

• $GD$, the global data directory that contains the location of each of the
  $d$ data objects; the $GD$ matrix contains $d$ rows and $n$ columns, and
  each entry is either 0 or 1 ($GD_{ij} = 1$ if node $j$ holds a copy of
  object $i$); i.e., $GD_{ij} \in \{0, 1\}$ for all $i \in D$ and $j \in N$

• the reliability of the nodes in the network.

The problem of estimating the metrics for a read-write transaction can be
similarly defined.
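To make the problem statement concrete, the following minimal sketch evaluates $TSA_s$ exactly when the full directory $GD$ and the node reliabilities are known, by enumerating all $2^n$ up/down states of the nodes (the exhaustive approach whose cost is discussed for Model 5 in Section 5.1). It assumes read-one/write-all copy control, independent node failures, and no link failures; the function names, the 0-based indexing of objects and nodes, and the small example directory are hypothetical.

```python
from itertools import product

def can_start(gd, up, read_set):
    """Read-one/write-all: every object in read_set needs a copy on a live node.

    gd[i][j] is 1 if node j holds a copy of object i (0-based indices here).
    """
    return all(any(gd[i][j] and up[j] for j in range(len(up))) for i in read_set)

def tsa_from_gd(gd, reliabilities, read_set):
    """Exact start availability: add up the probabilities of all 2**n node
    status vectors in which every object of read_set is reachable."""
    total = 0.0
    for up in product((False, True), repeat=len(reliabilities)):
        p_state = 1.0
        for is_up, rel in zip(up, reliabilities):
            p_state *= rel if is_up else 1.0 - rel  # independent node failures
        if can_start(gd, up, read_set):
            total += p_state
    return total

# Hypothetical directory: 3 objects, 3 nodes, 2 copies per object.
GD = [[1, 1, 0],
      [0, 1, 1],
      [1, 0, 1]]
print(tsa_from_gd(GD, [0.9, 0.9, 0.9], read_set=[0, 1, 2]))
```

In this directory every object survives the failure of any single node, so the result is the probability that at most one node is down, $0.9^3 + 3(0.9)^2(0.1) = 0.972$, up to floating-point rounding.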

4 Table 1 summarizes the notation used in this paper.
Symbol          Description
$a_i$           The number of data objects accessed from the $i$-th group
$c$             The average number of copies of a data object
$c_i$           The number of copies of a data object in the $i$-th class
$d$             The number of data objects in the database
$d_i$           The number of data objects in the $i$-th class
$g$             The number of data object groups
$k$             The number of live nodes
$n$             The number of nodes
$n_i$           The number of nodes in the $i$-th class
$p$             The number of copy classes
$q$             The number of reliability classes
$r$             The average node reliability
$r_i$           The reliability of a node in the $i$-th class
$s$             The size of the read-set
$A_1, A_2$      Policies representing the data grouping
$B_1, B_2$      Policies representing limits on the data objects per node
$C_1, C_2$      Policies representing the degree of replication
$D_1, D_2$      Policies representing the copy distribution
$E_1, E_2$      Policies representing the component reliability
$D$             The index set for the data objects in the database
$GA$            Group access vector representing the number of objects accessed
                from each class or group
$GD$            Global data directory (or dictionary)
$N$             The index set for the nodes in the database
$TSA_s$         Transaction start availability of a read-only transaction with
                read-set size $s$ (read-one/write-all policy)
$TSA'_{x,y}$    Transaction start availability of a read-write transaction with
                read-set size $x + y$ and write-set size $y$ (read-one/write-all policy)
$TSA''_s$       Transaction start availability of a transaction with read-set
                size $s$ (majority-read/majority-write policy)
$x$             The size of the read-only object set
$y$             The size of the read-write object set

Table 1: Notation
3 Model Description


As stated in the introduction, the primary objective of this paper is to
investigate the effect of data distribution, replication, and fault models
on availability estimations and the computational complexity of these
evaluations.

To describe a data distribution, replication, and fault model, we
characterize it with five orthogonal parameters:

A - Object grouping (or clustering) 

B - Limits on the number of data objects per node 

C - Degree of object replication (or the number of copies) 

D - Constraints on distribution of object copies 

E - Constraints on component reliability 

We now discuss each of these parameters in detail. 

Some distributed database systems allocate individual data objects [5,10].
We categorize this strategy as $A_1$. In other systems, data objects are
first partitioned into disjoint groups, and then the resulting groups are
allocated [12,16,17]. Thus, the copies of all the data objects in a given
group are allocated to the same set of nodes. We refer to this strategy as
$A_2$.

Some database designers place no explicit limit on the number of data
objects that may be placed at a node [7]. We refer to this strategy as
$B_1$. Others restrict the number of data objects that may be placed at a
given node, because of storage limitations or for security reasons [11]. We
refer to this strategy as $B_2$.

For simplicity, several analysis techniques assume that each data object
has the same number of copies (or degree of replication) in the database
system [6,16]. Some other techniques characterize the degree of replication
of a database by the average degree of replication of the data objects in
that database [7]. In this paper, both these categories are referred to as
$C_1$. Others treat the degree of replication of each data object
independently. We refer to this as strategy $C_2$.
Some database designers and analysts assume that the copies of each data
object (or group) are randomly distributed among the nodes in the
distributed system [7]. We refer to this as $D_1$. Others assume some
specific allocation schemes for data object (or group) copies [11]. Assuming
complete knowledge of the data copy distribution ($GD$) is one such
assumption. Depending on the type of allocation, such assumptions may
simplify the performance analysis [13]. This category is referred to as
$D_2$.

Again for simplicity, some database designers and analysts assume that all
components (nodes and links) in a distributed system have the same
reliability factor [1]. In this paper, we only consider node failures and
node repairs⁵. We let $E_1$ denote a policy where all nodes are assumed to
have the same reliability characteristics, and $E_2$ denote a policy where
nodes are classified based on their reliability characteristics.

Using this classification, any known data distribution, replication, and
reliability policy may be categorized by these five parameters. For example,
$\langle A_2, B_1, C_1, D_2, E_1 \rangle$ represents a policy where

1. Data objects are first grouped and then allocated. 

2. There is no explicit limit placed on the number of data objects (or 
groups) allocated to any node. 

3. Each group has the same average degree of replication. 

4. The copies of a group are distributed in some systematic manner 
among the nodes in the system. 

5. All nodes in the system have identical reliability characteristics. 

With these five parameters, we can describe thirty-two ($2^5$) basic
policies. Several variations of these basic schemes are possible due to
variations in systematic distributions ($D_2$), variations in the limits on
data objects per node ($B_2$), and the types of grouping ($A_2$). Due to
space limitations, in this paper we present the results for six of these
policies. The interested reader may refer to [14] for an analysis of other
policies.

5 That is, the underlying network structure almost always facilitates communication 
among live nodes. 
We chose the following six policies to study the effect of the
above-mentioned parameters on availability computations:

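Model 1: $\langle A_1, B_1, C_1, D_1, E_1 \rangle$

Model 2: $\langle A_2, B_1, C_1, D_1, E_1 \rangle$

Model 3: $\langle A_1, B_2, C_1, D_1, E_1 \rangle$

Model 4: $\langle A_1, B_1, C_2, D_1, E_1 \rangle$

Model 5: $\langle A_1, B_1, C_1, D_2, E_1 \rangle$

Model 6: $\langle A_1, B_1, C_1, D_1, E_2 \rangle$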

Among these, Model 1 represents a simple system that is computationally
attractive (as shown in Table 2). Model 2 reflects the effect of data
grouping on the evaluation. Similarly, Model 3 reflects the effect of placing
limits on the number of data objects per node. Model 4 represents the effect
of variations in the number of copies of data objects on availability
evaluation. Model 5 shows the effect of biased or non-random distributions
of data objects on the evaluation. Finally, Model 6 reflects the effect of a
non-homogeneous environment (i.e., different node reliability
characteristics) on transaction availability evaluation.
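The classification is mechanical enough to enumerate. The sketch below (all names hypothetical) lists the $2^5 = 32$ basic policies and encodes the six models as tuples of parameter indices, following the paragraph above in treating each of Models 2 to 6 as a one-parameter departure from Model 1.

```python
from itertools import product

PARAMS = ("A", "B", "C", "D", "E")

def policy_name(choice):
    """Render a tuple such as (2, 1, 1, 2, 1) as '<A2, B1, C1, D2, E1>'."""
    return "<" + ", ".join(f"{p}{v}" for p, v in zip(PARAMS, choice)) + ">"

# Two options for each of the five parameters: 2**5 basic policies.
ALL_POLICIES = [policy_name(c) for c in product((1, 2), repeat=len(PARAMS))]

MODELS = {
    1: (1, 1, 1, 1, 1),  # baseline
    2: (2, 1, 1, 1, 1),  # data grouping
    3: (1, 2, 1, 1, 1),  # limit on data objects per node
    4: (1, 1, 2, 1, 1),  # per-object degree of replication
    5: (1, 1, 1, 2, 1),  # systematic (known) copy distribution
    6: (1, 1, 1, 1, 2),  # heterogeneous node reliability
}

for number, choice in MODELS.items():
    changed = [p for p, v in zip(PARAMS, choice) if v == 2]
    print(f"Model {number}: {policy_name(choice)} varies {changed or 'nothing'}")
```

Isolating one parameter per model is what allows the comparison in Section 5 to attribute differences in cost or accuracy to a single modeling assumption.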

In the following section, we derive closed-form expressions for the average 
transaction availabilities for Models 1 and 2. 


4 Probabilistic Computation of the Availabilities 

There are several approaches for computing the availability of a given
transaction in a database. These computations assume given data
distribution, data replication, and fault models. We now look at two such
methods: simulation and probabilistic analysis.

Using simulation, one can generate the data distribution matrix ($GD$) based
on the data distribution and replication model. One can also generate the
reliabilities for each of the nodes in the system⁶. Similarly, one can
generate all possible transactions (with different read-sets and
write-sets) that can be received at each of the nodes in the network. For
each such transaction received by the system, the data distribution matrix
can be searched, and the transaction's ability to access all the required
data objects can be verified. In addition to generating transactions, we
must also generate node failures and node repairs in the time domain. Thus,
some transactions may be unsuccessful because one or more of the data
objects that they require are inaccessible (due to node failures). With such
statistics (of successful and unsuccessful transactions) in hand, we can
obtain the average availability of a transaction of a given size. This
average corresponds to a single distribution matrix, so the generation and
evaluation process may have to be repeated enough times to obtain the
required confidence in the final result. Since there are $d$ data objects,
there are $\binom{d}{s}$ possible transactions with read-set⁷ size $s$, and
there are $n$ nodes at which each of these may be received. Given a
transaction and the node where it is received, determining the state
(successful or unsuccessful) of the transaction takes at least $O(nd)$
computations (i.e., to scan the columns of the $GD$ matrix corresponding to
available nodes). If the distribution matrix is generated $k$ times, then
the evaluation of the average availability of a transaction of size $s$
takes $O(k n^2 d \binom{d}{s})$ time. In general, $k$ is a function of the
number of copies, the number of data objects, the number of nodes, and the
data distribution model, and it could be very high. Suppose $d = 100$,
$s = 10$, and $n = 10$; then this method requires approximately $10^{17} k$
computations. Even for reasonable values of $k$, this is an unreasonably
high computation time.

6 Here, we ignore the possibility of network partitioning, and thereby ignore
the link reliability factor.
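The operation count quoted above can be checked directly. This sketch (function name hypothetical) multiplies the $\binom{d}{s}$ read-sets, the $n$ entry nodes, and the $O(nd)$ directory scan, giving the cost for one generated $GD$ matrix, i.e., before multiplying by $k$.

```python
from math import comb

def simulation_cost(n: int, d: int, s: int) -> int:
    """Operations per generated GD matrix: comb(d, s) read-sets, each entering
    at any of n nodes, each checked by an O(n * d) scan of the directory."""
    return comb(d, s) * n * (n * d)

cost = simulation_cost(n=10, d=100, s=10)
print(f"{cost:.2e}")  # prints 1.73e+17
```

Since $\binom{100}{10} \approx 1.73 \times 10^{13}$ dominates the product, a single generated matrix already costs on the order of $10^{17}$ operations.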

To avoid this large evaluation time, we adopt probabilistic analysis. In
this analysis, we essentially study the given data distribution and
reliability model and arrive at an expression for the average transaction
availability for a given read-set (or write-set) size. With probabilistic
analysis, some data distribution models (e.g., Models 1 and 3) may require
insignificant amounts of computation. Some may need moderate computation
times (e.g., Models 2 and 6), whereas others may need large computation
times (e.g., Models 4 and 5). Regardless of the model, all of these need
considerably less computation time (and yield more accurate results) than
the corresponding simulation methods.

We now illustrate the probabilistic method of analysis by applying it to
Models 1 and 2. Expressions for other models may be derived in a similar
manner. The interested reader may find the details of these derivations in
[14].

7 The corresponding term for write-sets of update transactions may be easily written.

4.1 Derivation of Reliability Metrics for Model 1 

Model 1, designated as $\langle A_1, B_1, C_1, D_1, E_1 \rangle$, assumes the
following about the data distribution and replication:

[R1] The data objects are allocated individually (i.e., not grouped) to the
nodes.

[R2] There are no limits placed on the number of data objects that may be 
placed at each node. 

[R3] The average degree of replication (c) of a data object is given. 

[R4] The copies of a data object are allocated randomly. 

[R5] Each node in the system has identical reliability (= r). 

Further, to simplify the illustration of the current analysis, we make the
following assumptions regarding data access and the determination of the
participating node set:

[R6] Each transaction is equally likely to access any data object. 

[R7] The transactions that enter the distributed system are coordinated 
by a set of reliable servers that search the distributed database system 
(i.e., the availability of nodes and their dictionaries) for the availability 
of the required data objects. 

Due to Rule R7, we will not distinguish transactions that are received at
different locations in the system. Thus, we will disregard the originating
node as a parameter in this analysis⁸.

8 The analysis can easily be extended to a situation where transactions received at an
unavailable node are automatically considered unsuccessful.
4.1.1 Derivation of Availability for Read-only Transactions


Let us consider a read-only transaction $T_1$ with $s$ objects in its
read-set, received at one of the servers. Let us also assume that the copy
control algorithm follows a read-one/write-all policy. Thus, $T_1$ needs to
access any one of the $c$ copies of each data object that it requires.

Given that exactly k of the n nodes are available (i.e., up), the probability 
that at least one copy of a given data object is available is given by: 

$$
P_{k,1} = 1 - \frac{\binom{n-k}{c}}{\binom{n}{c}}
\tag{1}
$$


By definition of the read-one/write-all policy, $P_{k,1}$ represents the
probability that a data object is available for read access in the system.
Since each data object is allocated independently to the nodes in the
system (by Rules R1 and R2), the probability that all $s$ data objects
required by $T_1$ are available for read access within these $k$ nodes can
then be expressed as:


$$
P_{k,s} = \left( P_{k,1} \right)^{s}
        = \left[ 1 - \frac{\binom{n-k}{c}}{\binom{n}{c}} \right]^{s}
\tag{2}
$$


Assuming the reliability of any given node to be $r$ (from Rule R5), the
probability that $T_1$ has successfully started is:


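$$
TSA_s = \sum_{k=1}^{n} \binom{n}{k} r^{k} (1-r)^{n-k} P_{k,s}
      = \sum_{k=1}^{n} \binom{n}{k} r^{k} (1-r)^{n-k}
        \left[ 1 - \frac{\binom{n-k}{c}}{\binom{n}{c}} \right]^{s}
\tag{3}
$$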
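Equations (1) to (3) translate directly into a few lines of code. The following is a minimal sketch (function names hypothetical) of Model 1 under read-one/write-all; `math.comb` returns 0 when $n-k < c$, which matches the fact that an object is always readable once fewer than $c$ nodes are down.

```python
from math import comb

def p_read_one(n: int, k: int, c: int) -> float:
    """Eq. (1): P(at least one of c randomly placed copies is on the k live nodes)."""
    return 1.0 - comb(n - k, c) / comb(n, c)

def tsa_read_only(n: int, c: int, s: int, r: float) -> float:
    """Eq. (3): average Eq. (2) over the binomial number k of live nodes."""
    total = 0.0
    for k in range(1, n + 1):
        p_k_live = comb(n, k) * r**k * (1 - r) ** (n - k)  # exactly k nodes up
        total += p_k_live * p_read_one(n, k, c) ** s       # Eq. (2)
    return total

if __name__ == "__main__":
    # Hypothetical configuration: 10 nodes, 3 copies per object, 10 objects read.
    for r in (0.8, 0.9, 0.99):
        print(r, round(tsa_read_only(n=10, c=3, s=10, r=r), 4))
```

As a sanity check, with a single copy ($c = 1$) and a single object ($s = 1$), $P_{k,1} = k/n$ and the sum reduces to $E[k]/n = r$.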


Given that $T_1$ has successfully started, we now compute the probability
with which it can be successfully completed. Let us assume that $n_s$ nodes
are involved in the execution of $T_1$, and that it has an execution time of
$t$ units. For $T_1$ to be successful, all these $n_s$ nodes have to remain
available for at least $t$ units of time, given that they were available at
the start of execution. Assuming an exponential distribution for the time
between node failures with a failure rate of $\lambda$, the probability that
a node which is available at time zero remains available throughout time $t$
is given by:


$$
A_t = e^{-\lambda t}
\tag{4}
$$
From here, assuming that nodes fail independently, the probability that none
of the $n_s$ nodes fails during time $t$ is given by:

$$
TFA_s = A_t^{\,n_s} = e^{-n_s \lambda t}
\tag{5}
$$

Estimating $n_s$ for transaction $T_1$ is a complex problem. This problem has
been well investigated, and the details of the solutions may be found in
[15]. In this paper, we assume that $n_s$ for $T_1$ has been obtained a
priori for a given data distribution and fault model.
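Equations (4) and (5) amount to a one-line computation once $n_s$ is known. This sketch assumes, as the text does, that $n_s$ is supplied a priori and that node failures are independent and exponentially distributed; the function and parameter names (`failure_rate` standing in for $\lambda$) are hypothetical.

```python
from math import exp

def tfa(n_s: int, failure_rate: float, t: float) -> float:
    """Eqs. (4)-(5): n_s nodes, all up at the start, each stays up for t time
    units with probability exp(-failure_rate * t); failures are independent."""
    a_t = exp(-failure_rate * t)  # Eq. (4), failure_rate plays the role of lambda
    return a_t ** n_s             # Eq. (5), equal to exp(-n_s * failure_rate * t)

# Hypothetical values: 4 participating nodes, lambda = 1e-4 per second, t = 2 s.
print(tfa(n_s=4, failure_rate=1e-4, t=2.0))
```

For these values $\lambda t n_s = 8 \times 10^{-4}$, so TFA is about $0.9992$: as the introduction notes, TFA stays close to 1 whenever execution times are short relative to the mean time to failure.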


4.1.2 Derivation of Availability for Read-write Transactions 


Let us now consider a read-write transaction $T_2$ with $s$ objects in its
read-set and $y$ objects in its write-set. Let us assume that, for a given
read-write transaction, write-set $\subseteq$ read-set [3,7]. Thus, among
the $s$ data objects, $y$ objects are both read and written, while
$x = s - y$ data objects are only read. (Note that the intersection of the
read-only and the read-write sets of data objects is empty.) Since the
replication control algorithm follows a read-one/write-all policy, $T_2$
needs to access all $c$ copies of the $y$ data objects and any one copy of
the $x$ data objects.

Given that exactly k of the n nodes are available (i.e., up), the probability 
that all c copies of a given data object are available is given by: 


$$
P'_{k,1} = \frac{\binom{k}{c}}{\binom{n}{c}}
\tag{6}
$$


Since each data object is allocated independently to the nodes in the system
(by Rules R1 and R2), the probability that all $y$ data objects required by
$T_2$ are accessible for update is expressed as:

$$
P'_{k,y} = \left[ \frac{\binom{k}{c}}{\binom{n}{c}} \right]^{y}
\tag{7}
$$


Similarly, the probability that all x data objects are available for read access 
may be computed as: 



$$
P_{k,x} = \left[ 1 - \frac{\binom{n-k}{c}}{\binom{n}{c}} \right]^{x}
\tag{8}
$$


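From here, the probability that $T_2$ is successfully started may be
computed as:

$$
TSA'_{x,y} = \sum_{k=1}^{n} \binom{n}{k} r^{k} (1-r)^{n-k} P'_{k,y} P_{k,x}
           = \sum_{k=1}^{n} \binom{n}{k} r^{k} (1-r)^{n-k}
             \left[ \frac{\binom{k}{c}}{\binom{n}{c}} \right]^{y}
             \left[ 1 - \frac{\binom{n-k}{c}}{\binom{n}{c}} \right]^{x}
\tag{9}
$$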


The finish availability of $T_2$ may be similarly computed using Equations
(4) and (5), where $n_s$ is now replaced by $n_{x,y}$ [14].
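A minimal sketch of Equation (9) follows (function name hypothetical). Because `math.comb(k, c)` is 0 for $k < c$, the terms with too few live nodes to hold all copies vanish automatically, and a Python exponent of 0 leaves the corresponding factor at 1 when $x$ or $y$ is 0.

```python
from math import comb

def tsa_read_write(n: int, c: int, x: int, y: int, r: float) -> float:
    """Eq. (9) for Model 1 under read-one/write-all.

    x: objects that are only read (any one live copy suffices, Eq. (8));
    y: objects that are read and written (all c copies must be live, Eq. (7)).
    """
    total = 0.0
    for k in range(1, n + 1):
        p_k_live = comb(n, k) * r**k * (1 - r) ** (n - k)
        p_write = comb(k, c) / comb(n, c)           # Eq. (6): all c copies live
        p_read = 1.0 - comb(n - k, c) / comb(n, c)  # Eq. (1): some copy live
        total += p_k_live * p_write**y * p_read**x
    return total

if __name__ == "__main__":
    # Hypothetical workload: 10 objects per transaction, varying the write share.
    for y in (0, 2, 5):
        print(y, round(tsa_read_write(n=10, c=3, x=10 - y, y=y, r=0.95), 4))
```

Since write access to an object implies read access, $P'_{k,1} \le P_{k,1}$, so for a fixed $s = x + y$ the start availability cannot increase as the write share $y$ grows.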


4.1.3 Derivation of Availability for Transactions with Majority Consensus


In the above two sections, we dealt with the read-one/write-all replication
control policy. The majority consensus protocols [18], which require the
accessibility of at least a majority of the total copies of a data object
for both read and write operations, are very attractive in a failure-prone
environment. Since both read and write operations require the same number
of copies of a data object, in this analysis we do not distinguish between
read-only and update transactions. Here, we simply treat $T_1$ as a
transaction that accesses $s$ data objects.

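Let $m = \lfloor c/2 \rfloor + 1$ represent the majority of the $c$ copies.
Given $k$ live nodes, the probability that at least $m$ of the $c$ randomly
placed copies of a data object reside on live nodes is a hypergeometric tail
sum, so the start availability of $T_1$ is given as:

$$
TSA''_s = \sum_{k=1}^{n} \binom{n}{k} r^{k} (1-r)^{n-k}
          \left[ \sum_{l=m}^{\min(k,c)} \frac{\binom{k}{l} \binom{n-k}{c-l}}{\binom{n}{c}} \right]^{s}
\tag{10}
$$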


Similarly, the finish availability of $T_1$ may be expressed as:

$$
TFA_s = A_t^{\,n_s} = e^{-n_s t \lambda}
\tag{11}
$$

where $n_s$ now represents the average number of nodes accessed for
executing $T_1$ with the majority consensus protocol [15].
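Equation (10) differs from Equation (3) only in the per-object term, which becomes a hypergeometric tail. The sketch below (names hypothetical) makes that substitution; the inner range is empty when fewer than $m$ nodes are up, which yields probability 0 as required.

```python
from math import comb

def majority(c: int) -> int:
    """Smallest number of copies forming a majority of c copies: floor(c/2) + 1."""
    return c // 2 + 1

def p_majority(n: int, k: int, c: int) -> float:
    """P(at least m of the c randomly placed copies lie on the k live nodes)."""
    m = majority(c)
    hits = sum(comb(k, l) * comb(n - k, c - l) for l in range(m, min(k, c) + 1))
    return hits / comb(n, c)

def tsa_majority(n: int, c: int, s: int, r: float) -> float:
    """Eq. (10): condition on k live nodes, then require a live majority of
    copies for every one of the s objects."""
    total = 0.0
    for k in range(1, n + 1):
        p_k_live = comb(n, k) * r**k * (1 - r) ** (n - k)
        total += p_k_live * p_majority(n, k, c) ** s
    return total
```

For instance, with $n = c = 3$ every node holds a copy, so a single object is available exactly when at least two of the three nodes are up: $3r^2(1-r) + r^3$, which is $0.972$ for $r = 0.9$.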
4.2 Derivation of Transaction Availability for Model 2

Model 2, designated as $\langle A_2, B_1, C_1, D_1, E_1 \rangle$, is similar
to Model 1, except that the data objects are now grouped, and the groups are
then allocated to nodes in the system. This may be described as:

[R9] The data objects are first grouped, and the groups are then allocated
to the nodes. Let the $d$ data objects be partitioned into $t$ distinct
groups, and let $d_k$ represent the number of data objects in group $k$.
Thus, $\sum_{k=1}^{t} d_k = d$.

[R10] There are no limits placed on the number of groups that may be placed
at each node.

[R11] The degree of replication is the same for each group ($c$).

[R12] The copies of a group are allocated randomly.

[R13] Each node in the system has identical reliability ($r$).

Again, to simplify analysis, we make the following assumptions: 

[R14] Each transaction is equally likely to access any data object. 

[R15] The transactions that enter the distributed system are coordinated 
by a set of reliable servers that search the distributed database system 
(i.e., the availability of nodes and their dictionaries) for the availability 
of required data objects. 

4.2.1 Derivation of Availability for Read-only Transactions

Once again, let us consider transaction $T_1$ executing under a
read-one/write-all policy. Given that $l$ of the $n$ nodes are available
(i.e., up), the probability that at least one copy of group $k$ is available
is given by:

$$
P_{l,1} = 1 - \frac{\binom{n-l}{c}}{\binom{n}{c}}
\tag{12}
$$

If the vector $GA = \langle a_1, a_2, \ldots, a_t \rangle$ represents the
number of data objects accessed by $T_1$ from each of the $t$ groups, then
the probability that $T_1$ is successfully started may be computed as:


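$$
TSA_s = \sum_{GA} \Pr(GA) \sum_{l=1}^{n} \binom{n}{l} r^{l} (1-r)^{n-l}
        \prod_{k=1}^{t} \left[ 1 - \frac{\binom{n-l}{c}}{\binom{n}{c}} \right]^{f(k)}
\tag{13}
$$

$$
\Pr(GA) = \frac{\prod_{k=1}^{t} \binom{d_k}{a_k}}{\binom{d}{s}}
\tag{14}
$$

$$
f(k) = \begin{cases} 1 & \text{if } a_k > 0 \\ 0 & \text{otherwise} \end{cases}
\tag{15}
$$

$$
GA = \langle a_1, a_2, \ldots, a_t \rangle, \quad
\sum_{k=1}^{t} a_k = s \ \text{ and } \ \forall k,\ 1 \le k \le t:\ 0 \le a_k \le d_k
\tag{16}
$$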
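Equations (13) to (16) can be evaluated by brute-force enumeration of the GA vectors. This is a minimal sketch (names hypothetical); `math.prod` requires Python 3.8 or later.

```python
from math import comb, prod

def group_access_vectors(sizes, s):
    """Eq. (16): every GA = <a_1, ..., a_t> with sum s and 0 <= a_k <= d_k."""
    if not sizes:
        if s == 0:
            yield ()
        return
    for a in range(min(sizes[0], s) + 1):
        for tail in group_access_vectors(sizes[1:], s - a):
            yield (a,) + tail

def tsa_grouped(n, c, r, sizes, s):
    """Eq. (13): TSA_s for Model 2 with group sizes d_k = sizes[k]."""
    d = sum(sizes)
    total = 0.0
    for ga in group_access_vectors(sizes, s):
        pr_ga = prod(comb(dk, ak) for dk, ak in zip(sizes, ga)) / comb(d, s)  # Eq. (14)
        touched = sum(1 for ak in ga if ak > 0)  # sum of f(k) over groups, Eq. (15)
        for l in range(1, n + 1):
            p_live = comb(n, l) * r**l * (1 - r) ** (n - l)
            p_group = 1.0 - comb(n - l, c) / comb(n, c)  # some copy of a group live
            total += pr_ga * p_live * p_group**touched
    return total
```

When every group holds a single object, each accessed object touches its own group, so `tsa_grouped` reduces to Equation (3); at the other extreme, a single group makes the result independent of $s$.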


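When data objects are equally distributed among the groups (i.e.,
$d_1 = d_2 = \cdots = d_t = d/t$), the expression depends on $GA$ only
through the number of groups that $T_1$ accesses, and it may be further
simplified as:

$$
TSA_s = \sum_{l=1}^{n} \sum_{k=1}^{t} \binom{n}{l} r^{l} (1-r)^{n-l}
        \frac{\binom{t}{k} \sum_{i=0}^{k} (-1)^{i} \binom{k}{i} \binom{(k-i)d/t}{s}}{\binom{d}{s}}
        \left[ 1 - \frac{\binom{n-l}{c}}{\binom{n}{c}} \right]^{k}
\tag{17}
$$

where the alternating-sum fraction is the probability that the $s$ objects
accessed by $T_1$ fall in exactly $k$ of the $t$ groups.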


The expression for $TFA_s$ is the same as in Equation (5).
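For equally sized groups, the only combinatorial ingredient is the probability that a random $s$-subset touches exactly $j$ of the $t$ groups, which inclusion-exclusion gives in closed form. The sketch below (names hypothetical; `group_size` is $d/t$) computes it, cross-checks it by brute force on a tiny case, and uses it in the simplified start availability.

```python
from itertools import combinations
from math import comb

def prob_groups_touched(t: int, group_size: int, s: int, j: int) -> float:
    """P(a random s-subset of t * group_size objects touches exactly j groups):
    pick the j groups, then count subsets of their objects leaving none empty."""
    onto = sum((-1) ** i * comb(j, i) * comb((j - i) * group_size, s) for i in range(j + 1))
    return comb(t, j) * onto / comb(t * group_size, s)

def brute_force_groups_touched(t: int, group_size: int, s: int, j: int) -> float:
    """Same probability by listing every s-subset (tiny cases only)."""
    subsets = list(combinations(range(t * group_size), s))
    hits = sum(1 for sub in subsets if len({obj // group_size for obj in sub}) == j)
    return hits / len(subsets)

def tsa_equal_groups(n: int, c: int, r: float, t: int, group_size: int, s: int) -> float:
    """Condition on l live nodes and on j groups touched; each touched group
    needs at least one live copy."""
    total = 0.0
    for l in range(1, n + 1):
        p_live = comb(n, l) * r**l * (1 - r) ** (n - l)
        p_group = 1.0 - comb(n - l, c) / comb(n, c)
        for j in range(1, t + 1):
            total += p_live * prob_groups_touched(t, group_size, s, j) * p_group**j
    return total

# Cross-check: 3 groups of 2 objects, read-set of size 3.
for j in range(1, 4):
    assert abs(prob_groups_touched(3, 2, 3, j) - brute_force_groups_touched(3, 2, 3, j)) < 1e-12
```

The closed form replaces an enumeration over roughly $s^t$ GA vectors with $O(t^2)$ binomial terms, which is why equal group sizes are so much cheaper to evaluate.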

4.2.2 Derivation of Availability for Read-write Update Transactions

Let us consider transaction $T_2$, which requires $x$ objects for read-only
operations and $y$ data objects for read and write operations
($s = x + y$). Thus, we need to define two GA vectors, for the read-only and
the read-write data object sets:


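$$
GA' = \langle a'_1, a'_2, \ldots, a'_t \rangle, \quad
\sum_{k=1}^{t} a'_k = x \ \text{ and } \ \forall k,\ 1 \le k \le t:\ 0 \le a'_k \le d_k
$$

and

$$
GA'' = \langle a''_1, a''_2, \ldots, a''_t \rangle, \quad
\sum_{k=1}^{t} a''_k = y \ \text{ and } \ \forall k,\ 1 \le k \le t:\ 0 \le a''_k \le d_k - a'_k
$$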
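In computing $TSA'_{x,y}$, we should recall that if a data object is write
accessible under given node availability conditions, it is also read
accessible; however, the reverse is not true. Since all the objects of a
group share the same set of copy locations, a group that contains at least
one read-write object contributes only the write-access factor, while a
group that contains only read-only objects contributes the read-access
factor. These facts are used in deriving the following expression for
$TSA'_{x,y}$: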


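$$
TSA'_{x,y} = \sum_{GA'} \sum_{GA''} \Pr(GA') \Pr(GA'')
  \sum_{l=1}^{n} \binom{n}{l} r^{l} (1-r)^{n-l}
  \prod_{k=1}^{t}
  \left[ 1 - \frac{\binom{n-l}{c}}{\binom{n}{c}} \right]^{f'(k)}
  \left[ \frac{\binom{l}{c}}{\binom{n}{c}} \right]^{f''(k)}
\tag{18}
$$

where

$$
\Pr(GA') = \frac{\prod_{k=1}^{t} \binom{d_k}{a'_k}}{\binom{d}{x}}, \qquad
\Pr(GA'') = \frac{\prod_{k=1}^{t} \binom{d_k - a'_k}{a''_k}}{\binom{d-x}{y}}
$$

$$
f'(k) = \begin{cases} 1 & \text{if } a''_k = 0 \wedge a'_k > 0 \\ 0 & \text{otherwise} \end{cases}
\qquad
f''(k) = \begin{cases} 1 & \text{if } a''_k > 0 \\ 0 & \text{otherwise} \end{cases}
$$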
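Equation (18) can likewise be evaluated by enumerating $GA'$ and then $GA''$ over the objects left in each group. This minimal sketch (names hypothetical) counts the groups that are written ($\sum_k f''(k)$) and the groups that are only read ($\sum_k f'(k)$).

```python
from math import comb, prod

def bounded_vectors(bounds, total):
    """All vectors v with 0 <= v[k] <= bounds[k] and sum(v) == total."""
    if not bounds:
        if total == 0:
            yield ()
        return
    for v in range(min(bounds[0], total) + 1):
        for tail in bounded_vectors(bounds[1:], total - v):
            yield (v,) + tail

def tsa_grouped_read_write(n, c, r, sizes, x, y):
    """Eq. (18): Model 2 start availability for x read-only and y read-write objects."""
    d = sum(sizes)
    total = 0.0
    for ga1 in bounded_vectors(sizes, x):  # GA': read-only objects per group
        pr1 = prod(comb(dk, a) for dk, a in zip(sizes, ga1)) / comb(d, x)
        left = [dk - a for dk, a in zip(sizes, ga1)]
        for ga2 in bounded_vectors(left, y):  # GA'': read-write objects per group
            pr2 = prod(comb(rk, b) for rk, b in zip(left, ga2)) / comb(d - x, y)
            n_read = sum(1 for a, b in zip(ga1, ga2) if a > 0 and b == 0)  # sum of f'(k)
            n_write = sum(1 for b in ga2 if b > 0)                          # sum of f''(k)
            for l in range(1, n + 1):
                p_live = comb(n, l) * r**l * (1 - r) ** (n - l)
                p_read = 1.0 - comb(n - l, c) / comb(n, c)
                p_write = comb(l, c) / comb(n, c)
                total += pr1 * pr2 * p_live * p_read**n_read * p_write**n_write
    return total
```

With one object per group, no group can be both read-only and written, so the result coincides with Equation (9) for Model 1.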


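As before, when data objects are equally distributed among the groups
(i.e., $d_1 = d_2 = \cdots = d_t = d/t$), this expression may be simplified
as:

$$
TSA'_{x,y} = \sum_{l=1}^{n} \sum_{k_1=1}^{t} \sum_{k_2=0}^{t-k_1}
  \binom{n}{l} r^{l} (1-r)^{n-l} \, Q(k_1, k_2)
  \left[ \frac{\binom{l}{c}}{\binom{n}{c}} \right]^{k_1}
  \left[ 1 - \frac{\binom{n-l}{c}}{\binom{n}{c}} \right]^{k_2}
\tag{19}
$$

where $Q(k_1, k_2)$, the probability that the $y$ read-write objects fall in
exactly $k_1$ groups and the $x$ read-only objects touch exactly $k_2$ of the
groups that contain no read-write object, is

$$
Q(k_1, k_2) = \frac{\binom{t}{k_1} \binom{t-k_1}{k_2}}{\binom{d}{y} \binom{d-y}{x}}
  \left[ \sum_{i=0}^{k_1} (-1)^{i} \binom{k_1}{i} \binom{(k_1-i)d/t}{y} \right]
  \left[ \sum_{i=0}^{k_2} (-1)^{i} \binom{k_2}{i} \binom{(k_1+k_2-i)d/t - y}{x} \right]
$$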

The finish availability $TFA_{x,y}$ may be computed using Equation (5),
where $n_s$ is now replaced by $n_{x,y}$, which is assumed to be known a
priori in this paper.
4.2.3 Derivation of Availability for Transactions with Majority Consensus

As described in Section 4.1.3, under the majority consensus protocol both
the read-only set and the read-write set are treated in the same way for
access probability computations. Thus, we only consider a read-only
transaction with a read-set size of $s$. The expression for $TSA''_s$ can
now be written as:


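$$
TSA''_s = \sum_{GA} \Pr(GA) \sum_{l=1}^{n} \binom{n}{l} r^{l} (1-r)^{n-l}
          \prod_{k=1}^{t} \left[ P''_{l,1} \right]^{f(k)}
\tag{20}
$$

$$
P''_{l,1} = \sum_{i=m}^{\min(l,c)} \frac{\binom{l}{i} \binom{n-l}{c-i}}{\binom{n}{c}},
\qquad m = \left\lfloor \frac{c}{2} \right\rfloor + 1
\tag{21}
$$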


where Pr(GA) and f(k) are as defined in Equations (14) and (15). 
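Equations (20) and (21) combine the GA enumeration of Equation (13) with the majority tail of Equation (10). A minimal sketch, with hypothetical names:

```python
from math import comb, prod

def bounded_vectors(bounds, total):
    """All GA vectors with 0 <= a_k <= bounds[k] and sum(a) == total (Eq. (16))."""
    if not bounds:
        if total == 0:
            yield ()
        return
    for v in range(min(bounds[0], total) + 1):
        for tail in bounded_vectors(bounds[1:], total - v):
            yield (v,) + tail

def p_majority(n: int, l: int, c: int) -> float:
    """Eq. (21): P(at least floor(c/2) + 1 of a group's c copies are on l live nodes)."""
    m = c // 2 + 1
    hits = sum(comb(l, i) * comb(n - l, c - i) for i in range(m, min(l, c) + 1))
    return hits / comb(n, c)

def tsa_grouped_majority(n, c, r, sizes, s):
    """Eq. (20): majority-consensus start availability for Model 2."""
    d = sum(sizes)
    total = 0.0
    for ga in bounded_vectors(sizes, s):
        pr_ga = prod(comb(dk, ak) for dk, ak in zip(sizes, ga)) / comb(d, s)  # Eq. (14)
        touched = sum(1 for ak in ga if ak > 0)                              # sum of f(k)
        inner = sum(comb(n, l) * r**l * (1 - r) ** (n - l) * p_majority(n, l, c) ** touched
                    for l in range(1, n + 1))
        total += pr_ga * inner
    return total
```

With a single copy per group ($c = 1$) the majority is one copy, so the result matches the read-one/write-all value from Equation (13).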

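Once again, when data objects are equally distributed among the groups
(i.e., $d_1 = d_2 = \cdots = d_t = d/t$), this expression may be written as:

$$
TSA''_s = \sum_{l=m}^{n} \sum_{k=1}^{t} \binom{n}{l} r^{l} (1-r)^{n-l}
          \frac{\binom{t}{k} \sum_{i=0}^{k} (-1)^{i} \binom{k}{i} \binom{(k-i)d/t}{s}}{\binom{d}{s}}
          \left[ \sum_{l_1=m}^{\min(l,c)} \frac{\binom{l}{l_1} \binom{n-l}{c-l_1}}{\binom{n}{c}} \right]^{k}
\tag{22}
$$

where the alternating-sum fraction is the probability that the $s$ accessed
objects fall in exactly $k$ of the $t$ groups; terms with $l < m$ vanish
because no group can then have a live majority of copies.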


5 Comparison of the Availabilities for the Six Models


As mentioned in the introduction, the main objective of this paper is to
determine the effect of data distribution, replication, and fault models on
the estimation of transaction availability. To achieve this, we evaluate the
desired measure using six different models. The comparison of these
evaluations is based on computational time, storage requirement, and the
average values obtained.

Due to space limitations, we cannot present the detailed derivations of the
average values for Models 3-6. The final expressions, however, are
summarized in the appendix.
5.1 Computational Complexity

We now analyze each of the evaluation methods (for Models 1-6) for their 
computational complexity. 

• Let us refer to Model 1. From Equations (3) and (9), it is clear that the
  computation of $TSA_s$ and $TSA'_{x,y}$ takes $O(cn^2)$ time⁹. Similarly,
  from Equation (10), it is clear that the computation of $TSA''_s$
  requires $O(c^2 n^2)$ time.

• We now derive this complexity term for Model 2. Let us first look at the
  computation of $TSA_s$. From Equation (14), we derive that the computation
  of $\Pr(GA)$ requires $O(s)$ time. The number of GAs generated is
  approximately $O(s^t)$, where $t$ represents the number of data object
  groups. Given a GA vector and $\Pr(GA)$, the computation of $TSA_s$
  requires $O(nct + n^2)$ arithmetic operations (from Equation (13)). Thus,
  the evaluation of $TSA_s$ requires $O(s^t(nct + n^2 + s))$ time.
  Similarly, we can conclude that $TSA'_{x,y}$ requires
  $O(x^t y^t (nct + n^2 + s))$ time (Equation (18)), and $TSA''_s$ requires
  $O(s^t(nc^2t + n^2 + s))$ time (Equation (20)).

• For Model 3, the computational complexity for $TSA_s$ is
  $O(n^2 + n(s + c))$ (Equation (23)). Similarly, $TSA'_{x,y}$ and
  $TSA''_s$ require $O(n^2 + n(c + s))$ and $O(n^2 + n(c^2 + s))$ time,
  respectively (Equations (24) and (25)).

• The computational complexity for Model 4 depends on the number of copy
  categories. Assuming that $s < d_k$ for $k = 1, 2, \ldots, p$, we can
  generate approximately $s^p$ different CA vectors. Thus, the computation
  of $TSA_s$ requires $O(s^p(n^2 + npc + s))$ time. To compute $TSA'_{x,y}$,
  we need to compute the number of possible CA' and CA'' vectors. There are
  approximately $x^p$ CA' vectors and $y^p$ CA'' vectors. Thus, $TSA'_{x,y}$
  requires $O(x^p y^p (npc + n^2 + s))$ time. Similarly, we can conclude
  that $TSA''_s$ requires $O(s^p(npc^2 + n^2 + s))$ time.

• In Model 5, we assume that the entire data dictionary information
is available to us. Given a GD matrix and a node status vector $S$,
the computation of $f(S)$, $f'(S)$, and $f''(S)$ requires $O(nd)$ time to search
the matrix. Given $n$, there are $2^n$ possible $S$ vectors. Thus the
computations of $TSA_s$, $TSA'_{x,y}$, and $TSA''_s$ require $O(2^n(nd + s))$ time.

9 Here, we are assuming that the evaluation of the terms $\binom{p}{q}$ and $p^q$ takes $O(q)$ and
$O(1)$ time, respectively.

• In Model 6, the number of NA vectors generated is $(n_1 + 1)(n_2 +
1) \cdots (n_q + 1)$. For simplification, we approximate it as $(n/q + 1)^q$. Given
an NA vector, the computation of $TSA_s$, $TSA'_{x,y}$, and $TSA''_s$ requires $O(s +
c + q)$, $O(s + c + q)$, and $O(sc + c^2 + qc)$ time, respectively. Thus the three
metric evaluations require $O((n/q + 1)^q (s + c + q))$, $O((n/q + 1)^q (s + c + q))$,
and $O((n/q + 1)^q (cs + c^2 + cq))$ time, respectively.
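The Model 6 counting argument can be checked numerically. The following Python sketch is our own illustration, not part of the original analysis, and its function names are hypothetical. It enumerates the NA vectors for given reliability-class sizes $n_1, \ldots, n_q$ and compares the exact count $\prod_i (n_i + 1)$ with the approximation $(n/q + 1)^q$. Because $\sum_i (n_i + 1) = n + q$ is fixed, the AM-GM inequality makes the approximation an upper bound, reached exactly when all classes have equal size.

```python
from itertools import product
from math import prod


def na_vectors(class_sizes):
    """Yield every NA vector (a_1, ..., a_q) with 0 <= a_i <= n_i."""
    return product(*(range(n_i + 1) for n_i in class_sizes))


def exact_na_count(class_sizes):
    """Exact number of NA vectors: (n_1 + 1)(n_2 + 1)...(n_q + 1)."""
    return prod(n_i + 1 for n_i in class_sizes)


def approx_na_count(n, q):
    """Approximation (n/q + 1)^q; an upper bound by the AM-GM inequality."""
    return (n / q + 1) ** q


if __name__ == "__main__":
    sizes = [3, 3, 4]  # hypothetical split of n = 10 nodes into q = 3 classes
    n, q = sum(sizes), len(sizes)
    print(exact_na_count(sizes), sum(1 for _ in na_vectors(sizes)))  # 80 80
    print(round(approx_na_count(n, q), 2))  # 81.37
    print(2 ** n)  # 1024 node status vectors under Model 5, for comparison
```

For this split, Model 6 enumerates 80 NA vectors, while a Model 5 style enumeration over individual nodes would visit 1024 status vectors.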

These complexities are summarized in Table 2. From this table it may be
observed that models 1 and 3 are computationally very attractive, since their
costs are polynomial in $n$, $c$, and $s$. The complexity of evaluations with
models 2, 4, and 6 depends on the number of groups, the number of copy
variations, and the number of reliability variations, respectively. For systems
with a large number of nodes, evaluations with model 5 are very expensive,
since their cost grows as $2^n$.
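To make the comparison in Table 2 concrete, the sketch below plugs sample parameter values into the read-only column of Table 2. It is only a rough illustration: the big-O expressions are evaluated with every constant factor set to 1, the function name is hypothetical, and the values chosen for $t$, $p$, and $q$ are assumptions rather than settings taken from the experiments.

```python
def read_only_cost(model, *, n, c, s, d, t, p, q):
    """Evaluate the read-only entry of Table 2 for one model, constants dropped."""
    costs = {
        1: lambda: c * n**2,
        2: lambda: s**t * (n * c * t + n**2 + s),
        3: lambda: n**2 + n * c + n * s,
        4: lambda: s**p * (n * p * c + n**2 + s),
        5: lambda: 2**n * (n * d + s),
        6: lambda: (n / q + 1) ** q * (s + c + q),
    }
    return costs[model]()


if __name__ == "__main__":
    params = dict(n=10, c=3, s=10, d=1000, t=10, p=3, q=3)  # hypothetical values
    for m in sorted(range(1, 7), key=lambda m: read_only_cost(m, **params)):
        print(f"Model {m}: ~{read_only_cost(m, **params):.3g} operations")
```

With these sample values, the models are printed in the order 3, 1, 6, 4, 5, 2. Model 2 comes out most expensive here because $s^t = 10^{10}$ dominates when $t = 10$. A smaller number of groups would change that ordering, which is the dependence on $t$ noted above.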

5.2 Space Complexity 

We now discuss the space complexity for the six models: 

• Models 1 and 3 just require the values of $d$, $c$, $s$, $r$, and $n$. Thus the
storage requirement is $O(1)$.

• Since Model 2 requires that the $d_i$ values be stored and that the GA
vectors be generated, it requires $O(t)$ storage, where $t$ is the number
of data groups.

• Model 4 requires $O(p)$ storage to contain the $p$ copy classes.

• Model 5 requires $O(nd)$ storage for the GD matrix.

• Model 6 requires $O(q)$ storage to contain the node reliability class
information.

Thus, Model 5 has the largest storage requirement. These complexities are 
summarized in Table 3. 
| Model | Read-only (read-one/write-all) | Read-write (read-one/write-all) | Majority-read/majority-write |
|-------|--------------------------------|---------------------------------|------------------------------|
| 1 | $O(cn^2)$ | $O(cn^2)$ | $O(c^2 n^2)$ |
| 2 | $O(s^t(nct + n^2 + s))$ | $O(x^t y^t(nct + n^2 + s))$ | $O(s^t(nc^2 t + n^2 + s))$ |
| 3 | $O(n^2 + nc + ns)$ | $O(n^2 + nc + ns)$ | $O(n^2 + nc^2 + ns)$ |
| 4 | $O(s^p(npc + n^2 + s))$ | $O(x^p y^p(npc + n^2 + s))$ | $O(s^p(npc^2 + n^2 + s))$ |
| 5 | $O(2^n(nd + s))$ | $O(2^n(nd + s))$ | $O(2^n(nd + s))$ |
| 6 | $O((n/q + 1)^q(s + c + q))$ | $O((n/q + 1)^q(s + c + q))$ | $O((n/q + 1)^q(cs + c^2 + cq))$ |

Table 2: Computational Complexities for the Evaluation of Availabilities


| Model | Space complexity |
|-------|------------------|
| 1 | $O(1)$ |
| 2 | $O(t)$ |
| 3 | $O(1)$ |
| 4 | $O(p)$ |
| 5 | $O(nd)$ |
| 6 | $O(q)$ |

Table 3: Space Complexities for the Evaluation of Availabilities
5.3 Comparison of the Availabilities

In order to compare the effectiveness of each of these models, we have evalu-
ated availabilities for a wide range of parameters. Due to space limitations,
we present only a small subset of these results in this paper. Similarly,
since $TFA_s$, $TFA'_{x,y}$, and $TFA''_s$ are found to be insensitive to variations
in models, we do not present those results here; we present only the
results for the transaction start availabilities. These results are summarized
in Figures 1-7.

Figures 1-3 compare the availabilities obtained from the six models. The 
following assumptions are made for models 1-6: 

1. In Model 2, we assume that the d data objects are grouped into n 
data groups each containing d/n data objects. This is similar to the 
assumptions in [13]. 

2. In Model 3, we assume that each of the n nodes in the system is 
allocated exactly the same number of data objects (equal to dc/n). 

3. In Model 4, we assume that d/2 data objects have c copies, d/4 data
objects have c + 1 copies, and the rest have c - 1 copies. This keeps
the average number of copies the same (i.e., c) but brings a copy variation
factor into consideration.

4. In Model 5, we assume that the d data objects are allocated systemat-
ically so that the copies of the i-th data object are allocated, in a
circular manner, to the nodes starting from node (i mod n) + 1.

5. In Model 6, we assume that n/3 nodes have reliability r - 0.1, n/3
have reliability r + 0.1, and the rest have reliability r.¹⁰
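The Model 4 and Model 6 assumptions above are chosen so that the mean degree of replication stays at c and the mean node reliability stays at r. The minimal Python sketch below builds those parameter vectors and checks both averages. The helper names are hypothetical. When d or n is not divisible by 4 or 3, we assume the remainder goes to the "rest" class. The `delta` argument covers footnote 10, where a spread of 0.05 is used at r = 0.95 to keep every reliability within [0, 1].

```python
def model4_copies(d, c):
    """Copies per object: d/2 objects get c, d/4 get c + 1, the rest get c - 1."""
    half, quarter = d // 2, d // 4
    rest = d - half - quarter  # assumption: any remainder joins the c - 1 class
    return [c] * half + [c + 1] * quarter + [c - 1] * rest


def model6_reliabilities(n, r, delta=0.1):
    """Node reliabilities: n/3 at r - delta, n/3 at r + delta, the rest at r."""
    eps = 1e-12  # tolerance for floating-point rounding at the [0, 1] bounds
    if r - delta < -eps or r + delta > 1.0 + eps:
        raise ValueError("r +/- delta must stay within [0, 1]")
    third = n // 3
    rest = n - 2 * third  # assumption: any remainder stays at reliability r
    return [r - delta] * third + [r + delta] * third + [r] * rest


if __name__ == "__main__":
    copies = model4_copies(1000, 3)
    print(sum(copies) / len(copies))  # 3.0
    rels = model6_reliabilities(10, 0.75)
    print(round(sum(rels) / len(rels), 10))  # 0.75
    print(max(model6_reliabilities(10, 0.95, delta=0.05)))  # footnote 10 case
```

With delta = 0.1, the call at r = 0.95 raises an error, because r + 0.1 would exceed 1. That is why footnote 10 narrows the spread.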

Figure 1 summarizes the results for read-only transactions with read-one/write-
all policy. Figure 2 presents these results for transactions (read-only or
read-write) with majority-read/majority-write protocol. Finally, Figure 3
summarizes the results for read-write transactions with read-one/write-all
policy. From these results, we make the following observations:

10 When r = 0.95, we assume that n/3 nodes have reliability r - 0.05, n/3 have reliability
r + 0.05, and the rest have reliability r.
• For read-only transactions (with read-one/write-all policy),

(i) Evaluations with models 1 and 3 are close over the entire range 
of s and r. 

(ii) Evaluations with models 2 and 5 are also close over the entire
range of s and r. This may be explained by the fact that the
number of groups is g = n = 10 for model 2, and the systematic
distribution for model 5 implicitly results in 10 groups. However,
the two models differ in the manner in which these groups are distributed.

(iii) For r > 0.95, evaluations with all models except model 4 are
quite close.

(iv) Evaluations with model 4 appear to significantly deviate from 
all other models for r > 0.75. This implies that modeling of 
the degree of replication is a very important task in availability 
evaluations. 

• For transactions with majority-read/majority-write policy, 

(v) Evaluations with models 1 and 3 appear to be close. Similarly, 
evaluations with models 2 and 5 are close. In addition, evalua- 
tions with model 6 are close to evaluations with models 1 and 
3. 

(vi) For s > 25, the availabilities appear to be independent of the 
read-set size. This implies that computations for s > 25 are 
redundant. 

(vii) The evaluations with models 2 and 5 seem to differ at higher
values of n, whereas the evaluations with the other four models remain
close for n = 20.

(viii) Once again, variations in the degree of replication of individual
data objects appear to have a dominating effect on availability
evaluations.

• For read-write transactions with read-one/write-all policy, 

(ix) The availabilities for s > 5 are significant only when r > 0.99.
(x) Since the availabilities are generally low, the differences between
the models seem to have an insignificant effect. At high reliabilities
(i.e., r > 0.99), the evaluations with model 4 seem to deviate
from the evaluations with the other models.
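Observation (ii) attributes the agreement between models 2 and 5 to the fact that the systematic allocation implicitly forms 10 groups. The Python sketch below illustrates this. It assumes 0-based object and node indices, so that object i's copies start at node i mod n, and the function name is hypothetical. It builds the circular allocation for n = 10, d = 1000, c = 3 and counts the distinct sets of nodes that hold an object's copies. It also checks that every node ends up with dc/n objects, the balanced load assumed for Model 3.

```python
from collections import Counter


def circular_allocation(d, n, c):
    """Node set holding each object's c copies, placed on consecutive nodes (mod n)."""
    return [frozenset((i + k) % n for k in range(c)) for i in range(d)]


if __name__ == "__main__":
    alloc = circular_allocation(d=1000, n=10, c=3)
    groups = Counter(alloc)  # objects sharing the same node set form one group
    load = Counter(node for nodes in alloc for node in nodes)
    print(len(groups))           # 10 implicit groups
    print(set(groups.values()))  # {100}: each group holds d/n objects
    print(set(load.values()))    # {300}: each node stores dc/n objects
```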

We will now study the effect of the individual model parameters. 

• Models 1 and 3 are very simple, and need no further investigation. 

• Evaluations with model 2 represent the effect of data object group-
ing on availability (Figure 4). As the number of groups increases,
the availability appears to decrease, although this effect diminishes
for g > 25. The effect is insignificant for read-write transactions, and
it seems to vanish at high node reliabilities.

• Evaluations with model 4 represent the effect of variations in the degrees
of replication of data objects (Figure 5). These variations seem to have
an insignificant effect on read-write transactions. The effect of copy
variations is more apparent at high node reliabilities, and it is more
pronounced on read-only transactions (with read-one/write-all policy)
than on the other two classes.

• Model 5 represents the effect of data distribution on the availability
evaluations. From Figure 6, it may be observed that the distribution
effect is only evident at s > 10. In addition, the effects are more
significant for read-only transactions than for the other two classes. The
effect is less evident at high node reliabilities.

• Model 6 represents the effect of node reliability variations on avail- 
abilities. From Figure 7, it may be observed that the variations have 
almost no effect on availability evaluations. 

6 Conclusions 

The current investigations on measuring the effect of data distribution, repli- 
cation, and fault models on transaction availability evaluation have resulted 
in some very interesting observations. As part of this study, we chose six 
models representing six different parametric assumptions that researchers
and designers generally tend to make in their analyses. Using probabilis-
tic analysis, we derived expressions for transaction availability for three
classes of transactions: read-only transactions (read-one/write-all policy),
transactions with majority-read/majority-write policy, and read-write
transactions (read-one/write-all policy). The effect of these modeling
assumptions is measured by evaluating availabilities for different read-set
sizes. From these evaluations, we conclude that:

• By choosing a proper distributed database model, the computational 
complexity of transaction availability evaluations can be significantly 
reduced. 

• For values of s < 10, all models result in almost the same transaction
availability evaluations.

• It is not necessary to evaluate transaction availabilities for values of 
s > 25. 

• Evaluations for read-only transactions (with read-one/write-all
policy) are more sensitive to database modeling than those for the other
two classes of transactions.

• The degree of replication of individual (or groups of) data objects seems
to have a significant effect on transaction availabilities. Thus, when
different data objects have different numbers of copies, adopting the
average degree of replication to represent every object in the system may
not result in accurate availability evaluations.

• The actual distribution of data object copies has some, if not signifi- 
cant, impact on availability evaluation. 

• In a heterogeneous environment where different nodes may have dif- 
ferent reliabilities, it is sufficient to represent each node by the average 
node reliability, without affecting the availability evaluations. 

• Data object grouping (logical or physical) does not seem to affect the
accuracy of availability evaluations as long as the number of groups is
not too small (e.g., when d = 1000, g > 25 is sufficient).
Distributed database designers and researchers can utilize these results in
choosing appropriate parameters that would result in reduced computational 
requirements without sacrificing the resulting accuracy of the design and 
analysis of these systems. 
Appendix


Model 3 $\langle A_1, B_1, C_1, D_1, E_1 \rangle$:

Here, we assume that each node has exactly the same number of data objects
(= dc/n).


[Equations (23)-(28): expressions for $TSA_s$, $TSA'_{x,y}$, $TSA''_s$, $TFA_s$, $TFA'_{x,y}$, and $TFA''_s$, with auxiliary terms $x_k$, $y_k$, $z_k$, and $m$.]


Model 4 $\langle A_1, B_1, C_2, D_1, E_1 \rangle$:

Here, each data object may have its own degree of replication specified.
For an efficient computation, we classify the data objects into $p$ categories
($1 \le p \le n$) based on their degree of replication. $d_l$ denotes the number of data
objects in the $l$-th category, where each object has $c_l$ ($1 \le c_l \le n$) copies.
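The classification step for Model 4 amounts to grouping objects by their number of copies. A minimal Python sketch follows. It assumes that the per-object copy counts are given as a list, which is a hypothetical input format, and the function name is also hypothetical. It returns each category's degree of replication $c_l$ together with its size $d_l$.

```python
from collections import Counter


def copy_categories(copies, n):
    """Return [(c_l, d_l), ...]: distinct copy counts c_l and how many objects d_l have each."""
    if any(not 1 <= c_l <= n for c_l in copies):
        raise ValueError("each object needs between 1 and n copies")
    return sorted(Counter(copies).items())


if __name__ == "__main__":
    # Hypothetical input mirroring the Model 4 experiments: d = 1000, average c = 3.
    copies = [3] * 500 + [4] * 250 + [2] * 250
    categories = copy_categories(copies, n=10)
    print(categories)       # [(2, 250), (3, 500), (4, 250)]
    print(len(categories))  # p = 3
```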
The expressions for $TFA_s$, $TFA'_{x,y}$, and $TFA''_s$ are the same as in Equa-
tions (26)-(28).


Model 5 $\langle A_1, B_1, C_1, D_2, E_1 \rangle$:

Here, we assume that the entire data distribution is available as a dictionary, 
[Equation (34) and the expression for $TSA''_s$, a sum over node status vectors $S$ weighted by $\Pr(S)$.]

$$\Pr(S) = r^{f'''(S)} (1 - r)^{n - f'''(S)}$$

where

$S$ - Node status vector; $S_j = 1$ means node $j$ is up, and $S_j = 0$ means node $j$ is down.
$f(S)$ - The number of data objects available for read with the given node
status vector $S$. This is computed by scanning the columns of the GD
matrix corresponding to the live nodes (as given by $S$).
$f'(S)$ - The number of data objects available for update (i.e., all c copies
of these data objects are available at the live nodes) with the given node
status vector $S$. This is also computed by scanning the columns of the GD
matrix corresponding to the live nodes (as given by $S$).
$f''(S)$ - The number of data objects with a majority of their copies
among the available nodes. As before, this is computed by scanning the
columns of the GD matrix corresponding to the live nodes (as given by $S$).
$f'''(S)$ - The number of nodes available (or up), as indicated by the vector
$S$.
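The definitions above translate directly into a brute-force enumeration over all $2^n$ node status vectors. The Python sketch below is illustrative only, and its function names are hypothetical. It makes four assumptions: the GD matrix is stored as one 0/1 row per data object with one column per node; "a majority of copies" means strictly more than half; every object has at least one copy; and nodes fail independently, as the product form of $\Pr(S)$ implies. The sketch does not reproduce $TSA''_s$. Instead, it computes the expected values of $f(S)$, $f'(S)$, and $f''(S)$ under $\Pr(S)$, which can be checked against closed forms when every object has c copies.

```python
from itertools import product
from math import isclose


def status_counts(gd, status):
    """Return (f, f', f'', f''') for one node status vector S."""
    live = [j for j, up in enumerate(status) if up]  # columns of live nodes
    f = f1 = f2 = 0
    for row in gd:  # one row per data object, one 0/1 column per node
        total = sum(row)
        alive = sum(row[j] for j in live)
        f += alive >= 1          # readable: at least one live copy
        f1 += alive == total     # updatable: every copy is on a live node
        f2 += 2 * alive > total  # a strict majority of copies is live
    return f, f1, f2, len(live)


def expected_counts(gd, n, r):
    """Enumerate all 2^n vectors S, weighting each by Pr(S) = r^up (1 - r)^(n - up)."""
    prob_mass = 0.0
    totals = [0.0, 0.0, 0.0]  # running E[f], E[f'], E[f'']
    for status in product((0, 1), repeat=n):
        f, f1, f2, up = status_counts(gd, status)
        pr = r**up * (1 - r) ** (n - up)
        prob_mass += pr
        for k, value in enumerate((f, f1, f2)):
            totals[k] += pr * value
    return prob_mass, totals


if __name__ == "__main__":
    n, d, c, r = 5, 10, 3, 0.9  # small hypothetical system
    # Circular placement: object i has copies on nodes i, i+1, i+2 (mod n).
    gd = [[1 if (j - i) % n < c else 0 for j in range(n)] for i in range(d)]
    mass, (ef, ef1, ef2) = expected_counts(gd, n, r)
    print(isclose(mass, 1.0))                             # True
    print(isclose(ef, d * (1 - (1 - r) ** c)))            # True
    print(isclose(ef1, d * r**c))                         # True
    print(isclose(ef2, d * (3 * r**2 * (1 - r) + r**3)))  # True
```

The loop over `product((0, 1), repeat=n)` is the source of the $2^n$ factor in the Model 5 row of Table 2. Each call to `status_counts` scans the $n \times d$ matrix once.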


Model 6 $\langle A_1, B_1, C_1, D_1, E_2 \rangle$:

Here each node may have its own reliability. For computational purposes, we
categorize the nodes based on their reliability. We assume that there are $q$
($1 \le q \le n$) such categories. We let $n_i$ represent the number of nodes
with reliability $r_i$, and $a_i$ represent the number of currently active (or up)
nodes with this reliability.
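Grouping nodes by reliability class means that each NA vector $(a_1, \ldots, a_q)$ can be weighted by a product of binomial terms, so the evaluation need not enumerate all $2^n$ node status vectors. The sketch below assumes that nodes fail independently, as the product form of $\Pr(S)$ in Model 5 also implies. Under that assumption, $\Pr(a_1, \ldots, a_q) = \prod_i \binom{n_i}{a_i} r_i^{a_i} (1 - r_i)^{n_i - a_i}$. The function name and the example class sizes are hypothetical.

```python
from itertools import product
from math import comb, prod


def na_distribution(sizes, reliabilities):
    """Yield (NA vector, probability) pairs for q independent reliability classes."""
    ranges = (range(n_i + 1) for n_i in sizes)
    for active in product(*ranges):
        pr = prod(
            comb(n_i, a_i) * r_i**a_i * (1 - r_i) ** (n_i - a_i)
            for n_i, a_i, r_i in zip(sizes, active, reliabilities)
        )
        yield active, pr


if __name__ == "__main__":
    sizes, rels = [3, 3, 4], [0.65, 0.85, 0.75]  # hypothetical classes, average r = 0.75
    dist = list(na_distribution(sizes, rels))
    print(len(dist))                                    # 80 NA vectors
    print(round(sum(p for _, p in dist), 12))           # 1.0
    print(round(sum(sum(a) * p for a, p in dist), 12))  # expected up nodes: 7.5
```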
[Figure 1. Transaction Start Availabilities for Read-only Transactions (Read-one/Write-all Policy). Panels: (a) n=10, d=1000, c=3, r=0.4; (b) n=10, d=1000, c=3, r=0.75; (c) n=10, d=1000, c=3, r=0.90; (d) n=10, d=10000, c=3, r=0.75; (g) n=10, d=1000, c=3, r=0.95.]








[Figure 2. Transaction Start Availabilities with Read-Majority/Write-Majority Protocol. Panels: (a) n=10, d=1000, c=3, r=0.4; (b) n=10, d=1000, c=3, r=0.75; (c) n=10, d=1000, c=3, r=0.90; (d) n=10, d=10000, c=3, r=0.75.]


























[Figure 3. Transaction Start Availabilities for Read-write Transactions (with Read-one/Write-all Policy). Recoverable panels: (e) n=10, d=1000, c=5, r=0.90; (f) n=20, d=1000, c=3, r=0.90.]





































[Figure 7. Illustration of the Effect of Reliability Variations on Availability (Model 6). Panels: (a) n=10, d=1000, c=3, avg. r=0.50; (b) n=10, d=1000, c=3, avg. r=0.75; (d) n=10, d=1000, c=3, avg. r=0.95.]